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Abstract 

We consider many-body problems in classical mechanics where a 
wide range of time scales limits what can be computed. We apply 
the method of optimal prediction to obtain equations which are eas- 
ier to solve numerically. We demonstrate by examples that optimal 
prediction can reduce the amount of computation needed to obtain a 
solution by several orders of magnitude. 
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1 Stiff oscillatory mechanics 

There are many problems in classical mechanics where what can be computed 
is limited by the simultaneous presence of both fast and slow motion: some 
variables oscillate rapidly while others change slowly, so standard numerical 
methods can require a large number of time steps to give accurate answers. 
Stiffness of this type limits calculations of planetary motion, drift in high- 
frequency electronic oscillators, and the dynamics or large molecules |]l|. 

For instance, in molecular dynamics it is standard |0] to model the motion 
of many atoms as a mechanical system with a Hamiltonian of the form 



where {qj,Pj) are the coordinates and momenta of the atoms and is the 
number of atoms, commonly in the range 10^ to 10^. Here V denotes a 
smoothly-varying potential energy of interaction among coordinates, the g^s 
are bond angles or interatomic spacings (functions of the coordinates), the 
m's are masses, and A is a matrix of spring constants. Such models are used 
to describe both the large-scale motion that takes place over milliseconds and 
also the rapid vibrational motions at chemical bonds which are measured in 
terahertz. 
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In a recent paper |]^, Stuart and Warren considered a particular stiff 
Hamiltonian problem of the form ([1|) that was originally meant to model 
a particle interacting with a heat bath [Q, and they constructed numerical 
schemes that worked well with large time steps. They were able to compute 
the motion of slowly-varying quantities accurately, even when most of the 
dynamics was grossly underresolved in time (i.e., even when their time step 
was much longer than the periods of most normal modes of oscillation). 

This observation, that a scheme may be optimized to work well even when 
the resolution is poor, is similar to the results of optimal prediction p, |^, |^ ; 
optimal prediction is a method for reducing the resolution required to solve a 
large system of equations. A smaller system is constructed, designed to yield 
expectations of solutions of the larger system and to be computationally 
practical even when the larger system is not. Since Stuart and Warren have 
found schemes for some large, stiff systems that work with big time steps, it 
is natural to ask whether there are smaller systems of differential equations 
(just describing the slower modes) that would work at these big time steps. 

In this paper we show how optimal prediction may be applied to a class 
of large, stiff Hamiltonian systems like (0) to yield effective equations which 
are smaller and slower. We demonstrate the method on the Stuart- Warren 
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model and on a generalization of it that more closely approximates realistic 
models of molecular dynamics. The benefits are longer time steps, lower 
dimensionality (hence fewer force evaluations per time step), and a systematic 
approach that may may be broadly applied. 

2 Optimal prediction 

Optimal prediction is a method that takes a large system of differential equa- 
tions together with a probability distribution for the dependent variables, 
and produces a smaller system of equations for the expectations of some se- 
lected variables while averaging over all the others. The method is described 
in §|, 0. Error bounds for the method can be found in 
Suppose we are given a large dynamical system 

iii = Ri{ui, . . . ,un), i = l,...,N (2) 

for dependent variables ui, . . . ,un, and we are also given a normalized prob- 
abihty density P{ui, . . . ,un) which is invariant under (Q), 

^ dP 

^-g;^Rj{ui,...,UN) = 0. (3) 

j=l ■? 

The first step in the optimal prediction procedure is to identify "collective 
variables," meaning a small number of functions of the dependent variables 
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whose evolution we would like to predict. We denote these collective variables 
by f . . . Vn{u) where n < N. The idea in optimal prediction is to treat the 
u's as random, treat their combinations in the v^s as known, and to estimate 
the rates of change of the w's by conditional expectations. 

One writes out a formula for the rate of change of the v's induced by (^, 

^ dv 



5^ o-^^iK,--- ,«A^) (4) 



i=i ^ 



Then one uses P{u) to compute the expectation of this expression subject to 
conditions that ^^(m) = for some n numbers Vi . . .Vn, 



•j^{u) P{u) W6{vy{u) -v^) du 

/ -PI"") -v^) du 



Finally, one hypothesizes that the mean evolution of the f 's is approximated 
by the solutions v^{t) of the new system, 

I ^ dv \ 
v,{t) = {y.^R,{u,,...,u^)) (6) 

The new system @ is a closed system of equations for the u's, and it is 
n-dimensional instead of A^-dimensional. 

Equation (Q) approximates the evolution of the mean values of the f 's. 
The idea of the approximation is that at every moment in time, the m's 



are distributed according to their invariant probability density subject to 
conditions on the values of collective variables. All that changes in time is 
the conditions, according to our hypothesis Actually, if the f 's were given 
and the m's were distributed according to a conditioned invariant distribution 
at time t = 0, then at a future time t > the f 's would be indeterminate 
and the u's would become distributed in some more general way. Average 
values of the f 's at all times t > would still be well-defined though, and 
they would be determined by the values of the v's at t = 0. The system (H) 
is meant to approximate such exact mean evolutions of collective variables 
from initial values. 

Although equation (|^) is conjectural, some general results are known 
about its accuracy. First, it clearly gives an asymptotically exact predic- 
tion of mean futures for short times. Second, it appears in an exact for- 
mula for mean futures due to Zwanzig (recently studied by others [p!6| ) 
which reveals corrections in terms of history integrals and noise-like functions 
which are statistically uncorrelated with the collective variables. Third, er- 
ror bounds for the method have been established in the case of Hamiltonian 
dynamical systems 0. 

There are two technical challenges in the application of (|^): collective 
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variables must be selected, and the conditional expectations on the right- 
hand side must be explicitly evaluated, usually requiring approximations 
of the integrals in equation (H). Both steps are critical to accuracy. In 
complex problems, therefore, the best way to determine the usefulness of the 
approximation (j^) is empirically: one generates large random ensembles of 
initial conditions for (|^), integrates each initial condition, then averages the 
results to determine a mean future. One then compares the answer to an 
integral of (|^). 

In the present paper, we will consider Hamiltonian equations where the 
dependent variables are canonical coordinate pairs {qi,pi) ■ ■ ■ {qN,PN)- Hamil- 
tonian equations preserve the canonical probability density, , so we will 
use this as our probability density. We assume that the first n coordinate 
pairs . . . {qn,Pn) are of interest, and we will take the remaining dy- 

namical variables as random. 

The optimal prediction procedure is to take the full system of Hamilton's 
equations. 



discard the equations with indices j > n, and replace the right-hand sides of 
the remaining equations with their expectations with respect to condi- 
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dH 



Pj 



dH 



j = l,... ,iV, 



(7) 



tioned by the selected variables: 



.dH\ . / dH\ 



where (■)„ denotes the conditioned expectation, 
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{f)n = Z^ Yl dqj dpj e ^f{qi,... ,qN]Pi,... ,Pn) (9) 

j=n+l 

with Z a normalization constant. For any function / of the canonical vari- 
ables, (/)„ is a function of gi---g„, Pi---Pn only, so the 2n-dimensional 
system of equations (||) is closed. 

The reduced system (|^), the first approximation in optimal prediction, 
defines an approximate solution to a Liouville problem for the evolution of a 
probability measure on phase space. At least for short times, the system (|[) 
is guaranteed to give the expectations of the selected variables, averaging 
over all possible initial data for the discarded variables. 

We need to evaluate the conditional expectations in (|^). This is easy 
if is a Gaussian distribution (i.e., if H is quadratic, or equivalently if 
the equations of motion are linear). If is not Gaussian, perturbative 
techniques are available to approximate its expectations by Gaussian expec- 
tations. Thus the following results for Gaussian distributions will be sufficient 
for our purposes, see [^, |, ^ for details. 



Let Xi, . . . ,X]\f be Gaussian random variables distributed with density 

(-^ N N N \ 

J2bjXj]. (10) 
j=l k=l 3=1 J 

We denote expectations with respect to this density by (•), and (xj) = 
Y^^=i-^Tj'^j- ^ow suppose that Xi...Xn are given for all n < N. The 
conditional expectations of . . .xn conditioned by denoted 
{xi)n-i i = n + 1, . . . , iV and are given explicitly by 

n n 

{Xi)n = {xi) + A> ^;-'(^- - (^-)), t = n+l,...,N (11) 

/i=l u=l 

where M^^ — A~l for i/ = 1, . . . , n and is the inverse of the n x n 
(not N X N) matrix M. 

The conditioned covariances, Cov„(xi, Xj) = {xiXj)n — {xi)n{xj)n are given 
in terms of the unconditioned expectations Coy{xi,Xj) — (xiXj) — {xi){xj) 

by 

n n 

Cov„(x,, x^) = Cov(x,, x^) ^i^M-^^zh (12) 

/i=i 1^=1 

The conditioned expectation of any polynomial m. xi . . .x^ may be found 
from these formulae by Wick's theorem. 
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3 Generalizations of the Stuart- Warren ex- 
periments 

Stuart and Warren [§] (see also 0) 0; and |10|) considered a one-dimensional 
collection of particles connected by springs. There was one distinguished 
particle with mass 1, coordinate Q and momentum P. The distinguished 
particle was connected by springs of spring constant k to N other particles 
with masses k/j"^, coordinates qj and momenta pj, j = 1 . . . N, representing 
a heat bath. 

The motion of this collection of particles and springs is defined by the 
Hamiltonian 



H{Q,P;qi,. . . ,qN;pi,... ,Pn) 



1 ^ 



A_ + luQ - q,f 



(13) 



where {Q,P) and {qj,Pj) are canonically conjugate dynamical variables for 
j = 1, . . . ,N and rrij = k/j^. The equations of motion are 

N 

Q = P p = .v'iQ) + kJ2i<lj~Q) 

j=i (14) 

Qj = Pj/^j Pj = HQ -Qj)^ j = 1, • • • , ^ 

This system is of the form (0) (with an extra pair of coordinates {Q,P)), 

and it is chosen so that fast and slow motion are separated: lighter particles 
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will move faster, heavier particles will move slower, and the mass rrij goes 
down as j goes up. 

A central result of P] is that if all the heat bath particles start out ran- 
domly, with statistics determined by the canonical distribution, then in the 
limit N ^ oo the coordinate of the distinguished particle obeys the stochastic 
equation, 

Q + ^Q + ^'(Q)-|Q = i^ (15) 

where F{t) is a stochastic process related to white noise. This equation 
for Q is remarkable because it makes no reference to the history of Q — it 
is a differential equation, not an integro-differential equation. In a general 
Hamiltonian problem, if one variable Q is fixed initially and the others are 
random, at future times there is no time-invariant relationship among the 



expectation of Q and its time derivatives |Tl|, |T^, |T3[ . The first approximation 
of optimal prediction may be characterized as the assumption that the 
values of the selected variables do determine their own future expectations. 
In general this assumption is not exactly true, but in the Stuart- Warren 
model it is true exactly in the N oo limit. 

Stuart and Warren proceeded to integrate their model with large time 
steps. If Q were fixed, then each qj would oscillate harmonically with fre- 
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quency uj = j. This implies that a discretization of the 2N+2 equations (H) 



would be resolved in time if NAt ^ 1. If this condition on At were violated, 
then the result of the computation would depend on how the equations were 
discretized. The intriguing result of is that some schemes will give the 
right evolution for Q and P when NAt > 1 and others will not. For instance, 
if the scheme is 

/nn+l /nn pn+1 pn 

i=i (16) 

n+1 _ n »"+'^ — 

' ' =p]^'/m, ^-^^-fL = k{Q- - qj) j = l,...,N 

then cr = (a symplectic method) gives the right answer for Q and P, but 
£7=1 (another convergent method) does not. 

For concreteness, we pick V{Q) = \Q'^. Since H in (|T^) is then quadratic, 
the canonical probability density is Gaussian, and formula (|TTp gives the 
conditioned expectations as 

(g,)n = Q, (p,)n = {n<3<N). (17) 



Taking the conditional expectations of the right-hand sides of ([T^) and eval- 
uating them using these results, we find that the equations of optimal pre- 
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diction are 

n 

Q = P P = -Q + ky2{q^-Q) 

U (18) 

cif^=Pf^/m^ Pt^ = HQ - Qf,), /i=l,...,n 
These are identical in form to the original equations ([141) . It comes as no 

surprise, therefore, that the motion of Q can be computed with large At: 

pick the At desired, find an n -C such that nAt <C 1, and perform a 

resolved integration of ([T8|) with this n and At. Reasonable approximations 

for the selected variables are guaranteed, at least for short times. 

Figure ^ shows a fully-resolved calculation {NAt = 10"^) of P(t) starting 

from P(0) = 0, Q{0) = 1.5, with ^^(O) and Pj{0) chosen randomly from the 

canonical ensemble (i.e., chosen with probability density e~^) conditioned by 

Q{0) and -P(O). It also shows the solution to the same problem as computed 

by a resolved integration of (plSf), which was achieved with nAt = 10~^. 

The optimal prediction calculation accurately duplicates the low-frequency 

behavior of the exact solution, and it does so in fewer dimensions with a 

larger time step. In this case, with N = 10'^ and n = 10^, the optimal 

prediction curve was about 10, 000 times faster to compute than the resolved 

solution. The optimal prediction has the further advantage that it did not 

use the initial data gn+i(0) . . . gAr(O), p„+i(0) . . .pAr(O) and may claim to be 
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an average answer over all possible values of these data. 

4 More general models 

Realistic applications, such as molecular dynamics, involve more complex 
interactions than are present in the model (p!4D. In particular, we may expect 
that every particle would interact with every other, and that the interactions 
would be nonlinear. 

We therefore consider a generalization of the model (|T4D where every 
qi . . .qN is coupled to every other gi . . . by a spring, and the springs are 
nonlinear: 

H{qi, ... ,qN;pi,... ,Pn) 

N 2 ^ N N N N 

= E ^ + E E (ft - i-f + i*'^' E E (ft - i-r (") 

j=i ^ j=i i=j+i j=i i=j+i 

qj =Pj/mj 

N N 

p, = - - fc(^) ^^(g, - q,f 

1=1 1=1 

This model makes no reference to a distinguished particle; each one of the N 
particles interacts with all of the others through the same potential energy, 
which is parameterized by the new spring constants A;^^-* and k^^\ 

We derive the optimal prediction equations of the system (^) for gi . . . 
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Pi ... p„ by averaging over . . .Qn, Pn+i ■ ■ - Pn- Since the interactions are 
now nonlinear, the probability density e~~^ is no longer Gaussian, so we must 
work harder to evaluate the conditioned expectations. 



Hald has observed, as reported in |16|, that optimal prediction equations 



of the form (§) are always Hamiltonian, and that their Hamiltonian is 



log(/" n dq,dp,e-A. (21) 



We may therefore approximate the conditioned expectations of (P) by first 
approximating H', and then deriving (^ by differentiation: 

dH' . dH' 

dp^ dq^ 

We decompose H into its quadratic part plus its higher-order part. 



H = Ho + Hi 

^ „ i.(2) ^ ^ 



j=i '^^3 j=i i=j+i (23) 



2mj 2 

N N 

4" 



j=i i=j+i 

and proceed by determining H' perturbatively as a power series in k^^\ An al- 
ternate method for perturbative treatment of optimal prediction is described 
in [|15 . 
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Hald's formula (^Tj) implies 



j=n+l / \ I Uf=n+1 dpj 



H' = -logU n dq^dp^e-^'A -log 

V j=n+l ' 

= (i7o-part)-log(e-^^)^_^ 



Ho 



(24) 

where the new average, (■)n,o denotes an average with respect to the condi- 
tioned Gaussian measure, defined just as in the definition (^) but with Hq 
replacing H . The "(ifo-part)" term would be the effective Hamiltonian if Hi 
were zero, and it contributes linear terms to the equations of motion which 
are easily evaluated by the regression formula ([Tl|) . The other term in (24) 
is equal to a power series in k^'^\ 



CO /^-j^ 



log(e-^On,o = E^(^r)So (25) 



ml 

m=l 



where (if{")^'^g denotes the m-th cumulant of Hi with respect to the condi- 
tioned Gaussian measure. Each cumulant in this series may be evaluated by 
Wick's theorem, where only "connected" pairings (in the sense of perturba- 
tion theory in physics) are included. 
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To first order in k^^\ we need to evaluate 

N N 



4 

4 



n n 



n n 



- (constant) 
(26) 

where "(constant)" denotes terms that are independent of gi, . . . , g„ and 
Pi-i ■ ■ ■ ,Pn (and therefore do not affect equations of motion). The average 

(•)„^o may be deduced from the expectations, 

1 " 

{qj)n,o = - E 



Covo{qj,qi) 



1 

AW 



j,l^n + l,... ,N 



(27) 



(1 + Sji) 



together with Wick's theorem. The result for H', to first order in k^'^\ is 

n 2 „ n n 



a 



fj,= l ix=v+l 
n n 



+ xE Efa.-'jJ' 



(28) 



/U=l ix=v+l 



+te(«-4E''")* 



iVwo^ „(iV-n)(n+ 1) 



where the coupling constants to this order in k^^^ are 

C, = ^k^^) + 3 
n 



Nn 



(29) 



= k^^\N-n) 
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We differentiate PS| ) to obtain the optimal prediction equations for the new 

system to 0{k^^^)\ 



= -C2 ^{Qf^ - qu 

1 " 



n 



qu) 



fl = l, 



, n. 



(30) 



We performed a more rigorous test of this new model, comparing it to an 
actual mean evolution. The results are shown in Figure |^. We once again 
picked gi . . pi . . .p„ (n = 10) from the canonical distribution for 

particles (A^ = 1000 at A;^^^ = 1 and /c^^^ = 0.1). We then generated an 
ensemble of 100 sets of values for . . . gTv, Pn+i ■ ■ - Pn from the canonical 
distribution conditioned by gi . . . g„, Pi ■ ■ - Pn, and for each set integrated 
the equations (^). Averaging over all 100 solutions yielded the solid curve 
for pi(t). We then discarded the ensemble and used the original gi . . . 



Pi . . . p„ as initial conditions for the reduced system (|30D , which we integrated 
with At = 10~'^/n = 1/N. This At is small enough to resolve the reduced 
dynamics but much too large to resolve the original dynamics. The solution 
for pi{t) from ( pOD is the dashed curve. Finally, for comparison we performed 
the naive experiment of simply truncating the big system (pO]) to n degrees 
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of freedom, effectively ignoring the lighter particles without changing the 
interactions. This produced the dot-dashed curve. 

The figure shows that the reduced system accurately predicts the average 
evolution of pi{t), and it does so with 1 percent of the degrees of freedom 
and time steps that are 100 times larger. The naive experiment shows that 
the new couplings are critical to the answer. Since forces must be evaluated 
N{N — l)/2 times per time step for N particles, optimal prediction speeds 
up the calculation of in this case by about a factor of about 10^. 

5 Conclusions 

We have shown that optimal prediction may be applied to large, stiff Hamil- 
tonian systems of differential equations to make new systems that are smaller, 
better-conditioned, and approximate the original equations in the mean. We 
have demonstrated that the method gives accurate answers while allowing 
larger time steps and requiring fewer force evaluations. 
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Figure 1: The evolution of P{t) determined in two ways: by solving the 
equations of motion (|T^ with N = 10, 000 particles and random initial data 
(exact evolution, At = 10~^/A^); and by solving the reduced equations (|30D 
with n = 100 particles and a time step 100 times longer (optimal prediction. 
At = 1/A^ = 10~^/n). For these calculations, kQ = kg = 1. 
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Average p (t), new model, N=1000, n=10 
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Figure 2: The average evolution of pi(t) determined in three ways: by solv- 
ing the equations of motion ( pO]) for 100 different initial conditions, with 
N = 10^ particles, At = 10~'^/N, and then averaging all 100 solutions (mean 
evolution); by solving the reduced equations ( |5DD once, with n = 10 particles. 
At = 1/A^ = 10~'^/n (optimal prediction); and by solving the original equa- 
tions (^) once with N = 10, At = 10~^/A^ (naive prediction, just neglecting 
interactions with discarded variables). 



23 



